Take-home Exercise 2

This take-home exercise aims to perform spatial point pattern analysis of Airbnb listings in Singapore.

Genice Goh true
09-25-2021

1.0 Introduction

Singapore is one of the few global cities that has yet to legalise short-term rentals offered by platforms such as Airbnb. However, it is interesting to note that there are data sets for Singapore available in Inside Airbnb - an independent, non-commercial set of tools and data that allows the exploration of how Airbnb is being used in cities around the world.

As such, we would like to use these data sets to analyse the following:

2.0 The Data

The data sets used for this analysis include:

3.0 Installing and Loading Packages

The packages used for this analysis include:

# Install (if necessary) and load all required packages
packages = c('sf', 'spatstat', 'raster', 'maptools', 'tmap', 'tidyverse')
for (p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

4.0 Importing Geospatial Data

st_read() of sf package is used to import the geospatial data, which is in shapefile format.

mrt_sf <- st_read(dsn="data/geospatial",
               layer="MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source 
  `C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-09-25-take-home-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
sg_sf <- st_read(dsn="data/geospatial",
               layer="CostalOutline")
Reading layer `CostalOutline' from data source 
  `C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-09-25-take-home-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 60 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 2663.926 ymin: 16357.98 xmax: 56047.79 ymax: 50244.03
Projected CRS: SVY21
mpsz_sf <- st_read(dsn="data/geospatial",
               layer="MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\Users\user\Desktop\SMU\Y4S1\IS415\geniceee\IS415_blog\_posts\2021-09-25-take-home-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

From the output message, we can see that:

4.1 Data Preprocessing

Before we visualise the geospatial data, we will need to conduct data preprocessing to ensure that there are no invalid geometries or missing values.

4.1.1 Invalid Geometries

length(which(st_is_valid(mrt_sf) == FALSE))
[1] 0
length(which(st_is_valid(sg_sf) == FALSE))
[1] 1
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 9

There are no invalid geometries in the mrt_sf data frame, while the sg_sf and mpsz_sf data frames contain 1 and 9 invalid geometries respectively. We will use st_make_valid() of the sf package to repair the invalid geometries in the sg_sf and mpsz_sf data frames.

sg_sf <- st_make_valid(sg_sf)
mpsz_sf <- st_make_valid(mpsz_sf)

length(which(st_is_valid(sg_sf) == FALSE))
[1] 0
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 0

From the output message, we can observe that there are no longer any invalid geometries in the sg_sf and mpsz_sf data frames!

4.1.2 Missing Values

mrt_sf[rowSums(is.na(mrt_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)
sg_sf[rowSums(is.na(sg_sf))!=0,]
mpsz_sf[rowSums(is.na(mpsz_sf))!=0,]

Note that each check must index the same data frame whose values are being tested. All three checks return zero rows, which confirms that there are no missing values in any of the 3 sf data frames.

4.2 Verify Coordinate Reference System

We will first need to retrieve the coordinate reference system for verification. st_crs() of sf package is used to do this.

st_crs(mrt_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
st_crs(sg_sf)
st_crs(mpsz_sf)

The st_crs() outputs for sg_sf and mpsz_sf are identical to the one shown above for mrt_sf, so they are not repeated here.

From the output messages, we can observe that although the WKT describes SVY21, none of the 3 data frames carries the correct EPSG identifier for the SVY21 projected coordinate system, which should be 3414. (The 9001 codes appearing in the WKT refer to the metre length unit, not the coordinate system.) st_set_crs() of the sf package is therefore used to assign the correct EPSG code to the data frames. Note that st_set_crs() only relabels the CRS metadata and does not reproject the coordinates, which is appropriate here since the coordinate values are already in SVY21.

mrt_sf <- st_set_crs(mrt_sf, 3414)
sg_sf <- st_set_crs(sg_sf, 3414)
mpsz_sf <- st_set_crs(mpsz_sf, 3414)

We will now use st_crs() of the sf package to verify the updated coordinate reference system.

st_crs(mrt_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(sg_sf)
st_crs(mpsz_sf)

Again, the st_crs() outputs for sg_sf and mpsz_sf are identical to the one shown above for mrt_sf, so they are not repeated here.

The EPSG code is now correctly assigned to all 3 sf data frames!

4.3 Visualising Geospatial Data

It is useful to plot a map to visualise the geospatial data, so that we can easily get a preliminary look at the spatial patterns.

tmap_mode('view')

tm_shape(mrt_sf) +
  tm_dots() 

5.0 Importing Aspatial Data

Since the aspatial datasets are all in CSV format, read_csv() of readr package is used to import them. The outputs are tibble data frames.

It is also important to understand the data that we are working with. glimpse() of dplyr package is therefore used to perform exploratory data analysis.

airbnb2019 <- read_csv("data/aspatial/listings_300619.csv")
airbnb2021 <- read_csv("data/aspatial/listings_290621.csv")
hotels <- read_csv("data/aspatial/hotels.csv")
tourism <- read_csv("data/aspatial/tourism.csv")
glimpse(airbnb2019)
Rows: 8,293
Columns: 16
$ id                             <dbl> 49091, 50646, 56334, 71609, 7~
$ name                           <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id                        <dbl> 266763, 227796, 266763, 36704~
$ host_name                      <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group            <chr> "North Region", "Central Regi~
$ neighbourhood                  <chr> "Woodlands", "Bukit Timah", "~
$ latitude                       <dbl> 1.44255, 1.33235, 1.44246, 1.~
$ longitude                      <dbl> 103.7958, 103.7852, 103.7967,~
$ room_type                      <chr> "Private room", "Private room~
$ price                          <dbl> 81, 80, 68, 200, 92, 102, 203~
$ minimum_nights                 <dbl> 180, 90, 6, 1, 1, 1, 1, 7, 30~
$ number_of_reviews              <dbl> 1, 18, 20, 12, 20, 35, 23, 15~
$ last_review                    <date> 2013-10-21, 2014-12-26, 2015~
$ reviews_per_month              <dbl> 0.01, 0.28, 0.21, 0.13, 0.21,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 9, 9, 9, 9, 1, 4, 4,~
$ availability_365               <dbl> 365, 365, 365, 353, 353, 348,~
glimpse(airbnb2021) 
Rows: 4,238
Columns: 16
$ id                             <dbl> 49091, 50646, 56334, 71609, 7~
$ name                           <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id                        <dbl> 266763, 227796, 266763, 36704~
$ host_name                      <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group            <chr> "North Region", "Central Regi~
$ neighbourhood                  <chr> "Woodlands", "Bukit Timah", "~
$ latitude                       <dbl> 1.44129, 1.33432, 1.44041, 1.~
$ longitude                      <dbl> 103.7957, 103.7852, 103.7967,~
$ room_type                      <chr> "Private room", "Private room~
$ price                          <dbl> 81, 80, 67, 177, 81, 81, 52, ~
$ minimum_nights                 <dbl> 180, 90, 6, 90, 90, 90, 14, 1~
$ number_of_reviews              <dbl> 1, 18, 20, 20, 24, 48, 20, 13~
$ last_review                    <date> 2013-10-21, 2014-07-08, 2015~
$ reviews_per_month              <dbl> 0.01, 0.22, 0.16, 0.29, 0.34,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 4, 4, 4, 50, 50, 7, ~
$ availability_365               <dbl> 365, 365, 365, 365, 365, 365,~
glimpse(hotels)
Rows: 422
Columns: 9
$ NAME              <chr> "Jayleen Clarke Quay Hotel", "JEN Singapor~
$ ADDRESSPOSTALCODE <dbl> 59390, 238858, 249716, 399041, 238485, 399~
$ ADDRESSSTREETNAME <chr> "25 New Bridge Road", "277 Orchard Road , ~
$ HYPERLINK         <chr> "jayleenclarkequay@gmail.com", "singaporeo~
$ TOTALROOMS        <dbl> 20, 499, 565, 42, 81, 33, 17, 634, 56, 451~
$ KEEPERNAME        <chr> "James Federick Chong Kah Yean", "Kuok Oon~
$ Lat               <dbl> 1.288715, 1.300510, 1.304271, 1.311586, 1.~
$ Lng               <dbl> 103.8475, 103.8392, 103.8239, 103.8775, 10~
$ ICON_NAME         <chr> "hotel.gif", "hotel.gif", "hotel.gif", "ho~
glimpse(tourism) 
Rows: 107
Columns: 17
$ NAME              <chr> "Chinatown Heritage Centre, Singapore", "T~
$ DESCRIPTION       <chr> "Experience how Singapore’s early Chinese ~
$ ADDRESSSTREETNAME <chr> "48 Pagoda Street", "158 Telok Ayer Street~
$ HYPERLINK         <chr> "http://www.singaporechinatown.com.sg/", "~
$ PHOTOURL          <chr> "www.yoursingapore.com/content/dam/desktop~
$ URL_PATH          <chr> "www.yoursingapore.com/en/see-do-singapore~
$ IMAGE_ALT_TEXT    <chr> "Learn more about local Chinese culture at~
$ PHOTOCREDITS      <chr> "Joel Chua DY", "Joel Chua DY", "Joel Chua~
$ LASTMODIFIED      <dttm> 2015-11-02 02:16:52, 2015-11-02 02:20:57,~
$ LATITUDE          <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ LONGTITUDE        <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ META_DESCRIPTION  <chr> "At the Chinatown Heritage Centre, experie~
$ OPENING_HOURS     <chr> "Daily, 9am – 8pm,Last entry at 7pm.*China~
$ Lat               <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ Lng               <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ ICON_NAME         <chr> "tourist_spot.gif", "tourist_spot.gif", "t~
$ ADDRESSPOSTALCODE <dbl> NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA,~

From the output messages, we can observe that all 4 data frames contain latitude and longitude columns (named latitude/longitude, Lat/Lng, and LATITUDE/LONGTITUDE respectively) whose values are in decimal degrees. This implies that they are using the WGS84 geographic coordinate system. It is also worth noting that these 2 columns are of numeric data type.
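As a quick sanity check (a sketch, assuming the four tibbles imported above are loaded), we can confirm that the coordinate values fall within the decimal-degree bounds expected for Singapore, roughly latitudes 1.1 to 1.5 and longitudes 103.6 to 104.1:

```r
# Sanity check: coordinate values should fall within Singapore's
# approximate WGS84 decimal-degree bounds (lat 1.1-1.5, lng 103.6-104.1)
range(airbnb2019$latitude)
range(airbnb2019$longitude)
range(hotels$Lat)
range(hotels$Lng)
```

If any range fell well outside these bounds, the columns would likely not be WGS84 decimal degrees.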

5.1 Data Preprocessing

Before proceeding further, we will need to conduct data preprocessing to ensure that there are no missing values in the latitude and longitude columns of all 4 data frames.

We only look at these 2 columns because st_as_sf(), covered in Section 5.2, does not allow missing coordinate values. In addition, the attribute columns will be dropped as part of the geospatial data wrangling in Section 6.0, so NA values outside of these 2 columns are not a concern.

airbnb2019[is.na(airbnb2019$latitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
#   host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
#   latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>,
#   last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>
airbnb2019[is.na(airbnb2019$longitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
#   host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
#   latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>,
#   last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>
airbnb2021[is.na(airbnb2021$latitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
#   host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
#   latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>,
#   last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>
airbnb2021[is.na(airbnb2021$longitude)!=0,]
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
#   host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
#   latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>,
#   last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>
hotels[is.na(hotels$Lat)!=0,]
# A tibble: 0 x 9
# ... with 9 variables: NAME <chr>, ADDRESSPOSTALCODE <dbl>,
#   ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, TOTALROOMS <dbl>,
#   KEEPERNAME <chr>, Lat <dbl>, Lng <dbl>, ICON_NAME <chr>
hotels[is.na(hotels$Lng)!=0,]
# A tibble: 0 x 9
# ... with 9 variables: NAME <chr>, ADDRESSPOSTALCODE <dbl>,
#   ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, TOTALROOMS <dbl>,
#   KEEPERNAME <chr>, Lat <dbl>, Lng <dbl>, ICON_NAME <chr>
tourism[is.na(tourism$LONGTITUDE)!=0,]
# A tibble: 1 x 17
  NAME   DESCRIPTION   ADDRESSSTREETNAME HYPERLINK PHOTOURL  URL_PATH 
  <chr>  <chr>         <chr>             <chr>     <chr>     <chr>    
1 Cruis~ Watch the Si~ <NA>              <NA>      www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
#   PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
#   LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
#   Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
tourism[is.na(tourism$LATITUDE)!=0,]
# A tibble: 1 x 17
  NAME   DESCRIPTION   ADDRESSSTREETNAME HYPERLINK PHOTOURL  URL_PATH 
  <chr>  <chr>         <chr>             <chr>     <chr>     <chr>    
1 Cruis~ Watch the Si~ <NA>              <NA>      www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
#   PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
#   LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
#   Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>

We can see that there is a missing value in both the LATITUDE and LONGTITUDE columns of the tourism data frame, belonging to the observation “Cruise from Singapore”. Although “Cruise from Singapore” is considered a tourist attraction, it is not tied to a specific location. We will therefore remove it.

tourism <- tourism[!is.na(tourism$LATITUDE),]

We will now use the earlier code chunk to check if the missing value in the tourism data frame is removed.

tourism[is.na(tourism$LONGTITUDE)!=0,]
# A tibble: 0 x 17
# ... with 17 variables: NAME <chr>, DESCRIPTION <chr>,
#   ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, PHOTOURL <chr>,
#   URL_PATH <chr>, IMAGE_ALT_TEXT <chr>, PHOTOCREDITS <chr>,
#   LASTMODIFIED <dttm>, LATITUDE <dbl>, LONGTITUDE <dbl>,
#   META_DESCRIPTION <chr>, OPENING_HOURS <chr>, Lat <dbl>,
#   Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
tourism[is.na(tourism$LATITUDE)!=0,]
# A tibble: 0 x 17
# ... with 17 variables: NAME <chr>, DESCRIPTION <chr>,
#   ADDRESSSTREETNAME <chr>, HYPERLINK <chr>, PHOTOURL <chr>,
#   URL_PATH <chr>, IMAGE_ALT_TEXT <chr>, PHOTOCREDITS <chr>,
#   LASTMODIFIED <dttm>, LATITUDE <dbl>, LONGTITUDE <dbl>,
#   META_DESCRIPTION <chr>, OPENING_HOURS <chr>, Lat <dbl>,
#   Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>

There are no missing values remaining in the tourism data frame!

5.2 Converting Aspatial to Geospatial Data

Since these data sets will be used for spatial point pattern analysis later, they need to be converted into sf data frames to support the geospatial data wrangling in Section 6.0.

st_as_sf() of sf package is used to convert the airbnb2019, airbnb2021, hotels, tourism data frames into sf data frames.

As discovered in the exploratory data analysis performed earlier, the data frames use the WGS84 geographic coordinate system. We therefore assign the EPSG code 4326 when creating the sf data frames, before transforming them into the SVY21 projected coordinate system (EPSG: 3414) using st_transform() of the sf package.

airbnb2019_sf <- st_as_sf(airbnb2019,
                         coords = c("longitude", "latitude"),
                         crs=4326) %>%
  st_transform(crs=3414)

airbnb2021_sf <- st_as_sf(airbnb2021,
                         coords = c("longitude", "latitude"),
                         crs=4326) %>%
  st_transform(crs=3414)

hotels_sf <- st_as_sf(hotels,
                         coords = c("Lng", "Lat"),
                         crs=4326) %>%
  st_transform(crs=3414)

tourism_sf <- st_as_sf(tourism,
                         coords = c("LONGTITUDE", "LATITUDE"),
                         crs=4326) %>%
  st_transform(crs=3414)
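To verify that the conversion and transformation worked as intended, a quick check (a sketch, assuming the sf data frames above were created successfully) can retrieve the EPSG code of the new sf data frames, which should now be 3414:

```r
# The $epsg field of st_crs() returns the EPSG code of an sf object;
# after st_transform(crs=3414) it should report 3414
st_crs(airbnb2019_sf)$epsg
st_crs(tourism_sf)$epsg
```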

5.3 Visualising Converted Geospatial Data

Similarly, we will want to plot a map to visualise the converted geospatial data to easily get a preliminary look at the spatial patterns.

tmap_mode('view')

tm_shape(airbnb2019_sf) + 
  tm_dots(alpha=0.4,
          col="blue",
          size= 0.02) +
  
tm_shape(hotels_sf) + 
  tm_dots(alpha=0.4,
          col="red",
          size= 0.02) + 
  
tm_shape(tourism_sf) + 
  tm_dots(alpha=0.4,
          col="green",
          size= 0.02)

6.0 Geospatial Data Wrangling

We need to perform geospatial data wrangling before we can carry out spatial point pattern analysis.

6.1 Converting sf data frames to sp’s Spatial* class

as_Spatial() of sf package is used to convert the data from sf data frame to sp’s Spatial* class.

airbnb2019 <- as_Spatial(airbnb2019_sf)
airbnb2021 <- as_Spatial(airbnb2021_sf)
hotels <- as_Spatial(hotels_sf)
mrt <- as_Spatial(mrt_sf)
mpsz <- as_Spatial(mpsz_sf)
sg <- as_Spatial(sg_sf)
tourism <- as_Spatial(tourism_sf)

6.2 Converting Spatial* classes to generic sp format

The spatstat package requires the analysis data to be in ppp object form. To convert a Spatial* class into a ppp object, it first needs to be converted into a generic sp object (e.g. SpatialPoints or SpatialPolygons).

airbnb2019_sp <- as(airbnb2019, "SpatialPoints")
airbnb2021_sp <- as(airbnb2021, "SpatialPoints")
hotels_sp <- as(hotels, "SpatialPoints")
mrt_sp <- as(mrt, "SpatialPoints")
# mpsz_sp <- as(mpsz, "SpatialPolygons")
sg_sp <- as(sg, "SpatialPolygons")
tourism_sp <- as(tourism, "SpatialPoints")

6.3 Converting generic sp format into spatstat’s ppp format

The as() coercion method (made available through the maptools package) is used to convert the generic sp objects into spatstat’s ppp object format.

airbnb2019_ppp <- as(airbnb2019_sp, "ppp")
airbnb2021_ppp <- as(airbnb2021_sp, "ppp")
hotels_ppp <- as(hotels_sp, "ppp")
mrt_ppp <- as(mrt_sp, "ppp")
tourism_ppp <- as(tourism_sp, "ppp")

We can now take a look at the summary statistics of the newly created ppp objects.

summary(airbnb2019_ppp)
Planar point pattern:  8293 points
Average intensity 9.345289e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [7215.57, 44098.31] x [25166.35, 49226.35] units
                    (36880 x 24060 units)
Window area = 887399000 square units
summary(airbnb2021_ppp)
Planar point pattern:  4238 points
Average intensity 5.114513e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [7406.99, 43337.89] x [25330, 48391.55] units
                    (35930 x 23060 units)
Window area = 828622000 square units
summary(hotels_ppp)
Planar point pattern:  422 points
Average intensity 5.58414e-07 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [5939.24, 45334.18] x [25379.44, 44562.4] units
                    (39390 x 19180 units)
Window area = 755712000 square units
summary(mrt_ppp)
Planar point pattern:  185 points
Average intensity 2.32988e-07 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [6138.31, 45254.86] x [27555.06, 47854.2] units
                    (39120 x 20300 units)
Window area = 794032000 square units
summary(tourism_ppp)
Planar point pattern:  106 points
Average intensity 1.328016e-07 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [11380.23, 43659.54] x [22869.34, 47596.73] units
                    (32280 x 24730 units)
Window area = 798183000 square units

The output messages warn about the presence of duplicated points in all of the ppp objects except mrt_ppp. In spatial point pattern analysis, the presence of duplicates is a significant issue: the statistical methodology for spatial point processes is based largely on the assumption that points cannot be coincident. We therefore need to handle the duplicated points.
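Before handling them, we can quantify the problem. A sketch, assuming the ppp objects created above exist: multiplicity() of spatstat reports, for each point, how many points share its exact location, so counting entries greater than 1 tells us how many points are involved in duplication.

```r
# Count the points that share a location with at least one other point
sum(multiplicity(airbnb2019_ppp) > 1)
sum(multiplicity(hotels_ppp) > 1)
```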

6.4 Handling Duplicate Points

We will use jittering to address the duplicated-points problem. Jittering adds a small random perturbation to each duplicated point so that the points no longer occupy exactly the same location.

airbnb2019_ppp_jit <- rjitter(airbnb2019_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

airbnb2021_ppp_jit <- rjitter(airbnb2021_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

hotels_ppp_jit <- rjitter(hotels_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

mrt_ppp_jit <- rjitter(mrt_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

tourism_ppp_jit <- rjitter(tourism_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

We can now check for the presence of duplicate points in the newly created ppp objects.

any(duplicated(airbnb2019_ppp_jit))
[1] FALSE
any(duplicated(airbnb2021_ppp_jit))
[1] FALSE
any(duplicated(hotels_ppp_jit))
[1] FALSE
any(duplicated(mrt_ppp_jit))
[1] FALSE
any(duplicated(tourism_ppp_jit))
[1] FALSE

There are no more duplicated points in the newly created ppp objects!

6.5 Creating an owin object

It is good practice to confine the analysis within a geographical area when conducting spatial point pattern analysis.

In spatstat, an object called owin is specially designed to represent such a polygonal region.

sg_sp, the coastal outline of Singapore, will be used as the observation window.

sg_owin <- as(sg_sp, "owin")

plot(sg_owin)
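As a rough sanity check (a sketch, assuming sg_owin was created above), area() of spatstat returns the window area in the square of the coordinate unit; since SVY21 coordinates are in metres, the value for Singapore should be on the order of 7 × 10^8 square metres (roughly 700-odd square kilometres):

```r
# Window area in square metres (SVY21 coordinate units are metres)
area(sg_owin)
```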

6.6 Combining Spatial Point Events with the owin Object

We are going to extract the following spatial point events located within the Singapore boundary:

airbnb2019_SG_ppp = airbnb2019_ppp_jit[sg_owin]
airbnb2021_SG_ppp = airbnb2021_ppp_jit[sg_owin]
hotels_SG_ppp = hotels_ppp_jit[sg_owin]
mrt_SG_ppp = mrt_ppp_jit[sg_owin]
tourism_SG_ppp = tourism_ppp_jit[sg_owin]

We will now use plot() to visualise the newly derived ppp objects which are constrained within the Singapore boundary.

par(mfrow=c(3,2))

plot(airbnb2019_SG_ppp)
plot(airbnb2021_SG_ppp)
plot(hotels_SG_ppp)
plot(mrt_SG_ppp)
plot(tourism_SG_ppp)

6.7 Converting the Unit of Measurement for ppp Objects

We will need to convert the unit of measurement of the ppp objects from metres to kilometres to aid the exploratory spatial data analysis covered in the next few sections.

This is because the default unit of measurement of SVY21 is the metre, so the kernel density values would be computed in “number of points per square metre” and turn out to be very small.

rescale() of the spatstat package helps us achieve this goal.

airbnb2019_SG_ppp_km <- rescale(airbnb2019_SG_ppp, 1000, "km")
airbnb2021_SG_ppp_km <- rescale(airbnb2021_SG_ppp, 1000, "km")
hotels_SG_ppp_km <- rescale(hotels_SG_ppp, 1000, "km")
mrt_SG_ppp_km <- rescale(mrt_SG_ppp, 1000, "km")
tourism_SG_ppp_km <- rescale(tourism_SG_ppp, 1000, "km")
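To confirm the rescaling took effect (a sketch, assuming the rescaled objects above exist), unitname() of spatstat reports the unit of measurement attached to a ppp object, which should now read kilometres:

```r
# unitname() reports the unit of measurement attached to a ppp object
unitname(airbnb2019_SG_ppp_km)
```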